Asymptotic optimal control of multi-class restless bandits
Abstract
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable process whose state evolution depends on whether or not the bandit is made active. The aim is to find a control that determines at each decision epoch which bandits to make active in order to minimize the overall average cost associated with the states the bandits are in. Since finding the optimal control is typically intractable, we instead study an asymptotic regime obtained by letting the number of bandits that can be simultaneously made active grow proportionally with the population of bandits. We consider both a fixed population of bandits and a dynamic population of bandits, where bandits can depart and new bandits can arrive to the system over time. We propose a class of priority policies, obtained by solving a linear program, that are proved to be asymptotically optimal under a global attractor property and a technical condition. Indexability of the bandits is not required for the result to hold. For a fixed population of bandits, the technical condition reduces to checking a unichain property. For a dynamic population of bandits, we present a large class of restless bandit problems for which the technical condition is always satisfied. As an example, we present a multi-class M/M/S+M queue, which is inside this class of problems and satisfies the global attractor property. Hence asymptotic optimality of an index policy follows.
In case the bandits are indexable, we prove that Whittle’s index policy is included in the class of asymptotically optimal policies. This generalizes the result of Weber and Weiss (1990), who showed asymptotic optimality of Whittle’s index policy for a symmetric fixed population of bandits, to the setting of (i) several classes of bandits, (ii) multiple actions, and (iii) possible arrivals of new bandits.
In order to prove the main results, we combine fluid-scaling techniques with linear programming results. This proof approach differs from that taken in Weber and Weiss and, in contrast to the latter, allows arrivals of new bandits to the system to be included.
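To make the notion of a priority policy concrete, the following is a minimal Python sketch, not the authors' construction. It assumes a priority ordering over states is already given (in the abstract this ordering is obtained from a linear program, which is not reproduced here) and simply activates the bandits in the highest-priority states until the activation budget is used up. The function name `priority_activation`, the state labels, and the budget value are all hypothetical and chosen for illustration only.

```python
from typing import Hashable, List, Sequence


def priority_activation(
    states: Sequence[Hashable],
    priority_order: Sequence[Hashable],
    budget: int,
) -> List[int]:
    """Return the indices of the bandits to activate at one decision epoch.

    states[i] is the current state of bandit i.
    priority_order lists states from highest to lowest priority. It is an
    assumed input here; the abstract derives such an ordering from a linear
    program that this sketch does not solve.
    budget is the maximum number of bandits that may be active at once.
    """
    # Map each state to its rank (0 = highest priority).
    rank = {state: position for position, state in enumerate(priority_order)}
    # sorted() is stable, so bandits in the same state keep index order.
    ordered = sorted(range(len(states)), key=lambda i: rank[states[i]])
    # Activate at most `budget` bandits, starting from the top of the order.
    return ordered[: max(budget, 0)]


if __name__ == "__main__":
    # Five bandits, three hypothetical states, at most three active at once.
    current_states = ["low", "high", "mid", "high", "low"]
    chosen = priority_activation(current_states, ["high", "mid", "low"], budget=3)
    print(chosen)
```

In the asymptotic regime described above, `budget` would scale proportionally with the number of bandits. The sketch always fills the budget when enough bandits are present, which is a simplification; a full priority policy may also leave bandits in some states passive.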
Similar resources
Asymptotically optimal priority policies for indexable and non-indexable restless bandits
We study the asymptotic optimal control of multi-class restless bandits. A restless bandit is a controllable stochastic process whose state evolution depends on whether or not the bandit is made active. Since finding the optimal control is typically intractable, we propose a class of priority policies that are proved to be asymptotically optimal under a global attractor property an...
A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements
We generalise classical multi-armed and restless bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch provided they do not consume more resource than is...
Multi-armed restless bandits, index policies, and dynamic priority allocation
This paper presents a brief introduction to the emerging research field of multi-armed restless bandits (MARBs), which substantially extend the modeling power of classic multi-armed bandits. MARBs are Markov decision process models for optimal dynamic priority allocation to a collection of stochastic binary-action (active/passive) projects evolving over time. Interest in MARBs has grown steadil...
Asymptotically optimal index policies for an abandonment queue with convex holding cost
We investigate a resource allocation problem in a multi-class server with convex holding costs and user impatience under the average cost criterion. In general, the optimal policy has a complex dependency on all the input parameters and state information. Our main contribution is to derive index policies that can serve as heuristics and are shown to give good performance. Our index policy attri...